Resource Constrained Multimedia Event Detection

Authors

  • Zhen-Zhong Lan
  • Yi Yang
  • Nicolas Ballas
  • Shoou-I Yu
  • Alexander G. Hauptmann
Abstract

We present a study comparing the cost and efficiency tradeoffs of multiple features for multimedia event detection. Low-level as well as semantic features are a critical part of contemporary multimedia and computer vision research. Arguably, combinations of multiple feature sets have been a major reason for recent progress in the field, not just as low-dimensional representations of multimedia data, but also as a means to semantically summarize images and videos. However, their efficacy for complex event recognition in unconstrained videos on standardized datasets has not been systematically studied. In this paper, we evaluate the accuracy and contribution of more than 10 multi-modality features, including semantic and low-level video representations, using two newly released NIST TRECVID Multimedia Event Detection (MED) open source datasets, MEDTEST and KINDREDTEST, which together contain more than 1000 hours of video. Contrasting multiple performance metrics, such as average precision, probability of missed detection, and minimum normalized detection cost, we propose a framework to balance the trade-off between accuracy and computational cost. This study provides an empirical foundation for selecting feature sets that can handle large-scale data with limited computational resources and are likely to produce superior multimedia event detection accuracy. The framework also applies to other resource-limited multimedia analysis tasks, such as selecting or fusing multiple classifiers and choosing among different representations of each feature set.
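As a rough illustration of the accuracy-versus-cost trade-off described in the abstract, the sketch below computes non-interpolated average precision for one event and then picks features greedily under a compute budget. It is not the authors' framework: the greedy accuracy-per-cost rule, the assumption that per-feature gains are additive, the function names, and the feature names and numbers in the usage note are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated average precision for one event.

    labels: 1 for a positive video, 0 for a negative one.
    scores: detector confidence for each video (higher means more likely positive).
    """
    # Rank videos from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


def select_features(
    candidates: Dict[str, Tuple[float, float]], budget: float
) -> List[str]:
    """Greedy selection by accuracy gain per unit cost (an assumed heuristic).

    candidates maps a feature name to (estimated accuracy gain, compute cost).
    Gains are treated as additive, which is a simplification.
    """
    by_ratio = sorted(
        candidates.items(),
        key=lambda item: item[1][0] / item[1][1],
        reverse=True,
    )
    chosen: List[str] = []
    remaining = budget
    for name, (_gain, cost) in by_ratio:
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen
```

For example, with hypothetical candidates `{"feature_a": (0.30, 10.0), "feature_b": (0.10, 1.0), "feature_c": (0.25, 5.0)}` and a budget of 6 cost units, `select_features` keeps `feature_b` and `feature_c` and skips the expensive `feature_a`. A real study would measure gains from fused classifiers on held-out data rather than assume they add up.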

Similar resources

Detecting communities of workforces for the multi-skill resource-constrained project scheduling problem: A dandelion solution approach

This paper proposes a new mixed-integer model for the multi-skill resource-constrained project scheduling problem (MSRCPSP). The interactions between workers are represented as undirected networks. Therefore, for each required skill, an undirected network is formed which shows the relations of human resources. In this paper, community detection in networks is used to find the most compatible wo...

Hybrid Layered Video Encoding for Mobile Internet-Based Computer Vision and Multimedia Applications

Mobile networked environments are typically resource constrained in terms of the available bandwidth and battery capacity on mobile devices. Realtime video applications entail the analysis, storage, transmission, and rendering of video data, and are hence resource-intensive. Since the available bandwidth in the mobile Internet is constantly changing, and the battery life of a mobile video appli...

A Hybrid Layered Video Encoding Technique for Mobile Internet-based Vision

The increasing deployment of broadband networks and simultaneous proliferation of low-cost video capturing and multimedia-enabled mobile devices have triggered a new wave of mobile Internet-based computer vision applications. However, mobile networked environments are typically resource constrained in terms of the available bandwidth and battery capacity on mobile devices. Computer vision appli...

Decentralized algorithms for classifier topology optimization in large-scale multi-concept detection

Multi-concept identification in high volume multimedia streams is critical for a number of applications, including large-scale multimedia analysis, processing, and retrieval. Content of interest is filtered using a collection of binary classifiers that are deployed on distributed resource-constrained infrastructure. In this paper, we design distributed algorithms for determining the optimal top...

Audio self organized units for high-level event detection

High-level multimedia event detection aims to identify videos containing a target event. Recent approaches leveraging audio information for this task fall into two broad categories. The first corresponds to holistic bag-of-words approaches based on frame-level descriptors. These are effective for classification, but hard for humans to interpret. The second corresponds to approaches that build a...

Publication date: 2014